Markov Decision Processes with General Discount Functions
Authors
Abstract
In Markov Decision Processes, the discount function determines how much the reward at each point in time adds to the value of the process, and thus deeply affects the optimal policy. Two cases of discount functions are well known and analyzed. The first is no discounting at all, which corresponds to the total- and average-reward criteria. The second is a constant discount rate, which leads to a decreasing exponential discount function. However, other discount functions appear in many models, including those of human decision-making and learning, making it interesting and possibly useful to investigate other functions. We review results for a weighted sum of several exponential discount functions with different cost functions, showing that finite models with this criterion have optimal policies which are stationary from a fixed time N, aptly called N-stationary. We review a proof of their existence and an algorithm for their computation, and remark on the structure of these policies as the discount factors vary. We then discuss two attempts to generalize the results for weighted exponential discount functions. The first is a hypothesis for a sum of different general discount functions with certain exponential bounds, in the spirit of the results for the exponential case. We show via a counterexample that despite its intuitive appeal, the hypothesis is in fact not true, and make some remarks on why this is so. Our second attempt at generalization is to represent a general discount function as an infinite sum of decreasing exponential functions with constant coefficients. We give convergence conditions on the sum under which the previously established results can be extended, enabling us to find an optimal policy for it. We discuss two examples that clarify our results and connect them to areas which require non-exponential discount functions. The work is concluded with an example of a model with a monotonic discount function that has no optimal N-stationary policy.
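For concreteness, the criterion discussed above can be written as follows; the notation (d for the discount function, w_i and β_i for weights and discount factors, r for the reward) is ours and serves only to illustrate the setting, not to reproduce the paper's exact formulation:

\[
V^{\pi}(s) \;=\; \mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty} d(t)\, r(s_t, a_t)\right],
\qquad
d(t) \;=\; \sum_{i=1}^{k} w_i\, \beta_i^{\,t}, \quad w_i > 0,\ 0 < \beta_i < 1.
\]

Here d(t) ≡ 1 recovers the total-reward criterion, a single term d(t) = β^t recovers standard exponential discounting, and letting k be infinite (under suitable convergence conditions on the weights) gives the representation used in the second generalization mentioned in the abstract.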
Similar resources
Markov decision processes with exponentially representable discounting
We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable” discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist.
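As a rough illustration of how such an algorithm can proceed, the sketch below assumes the simplest eventually-exponential case, namely that the discount function satisfies d(t) = d(N)·β^(t−N) for all t ≥ N; the function name n_stationary_policy and all variable names are ours, not taken from the paper. From time N onward the criterion reduces to an ordinary β-discounted MDP, which is solved by value iteration, and the first N decision rules are then obtained by backward induction with the actual weights d(t).

import numpy as np

def n_stationary_policy(P, r, d, N, beta, tol=1e-10):
    # Sketch: optimal N-stationary policy for a finite MDP whose discount
    # function d(t) coincides with d(N) * beta**(t - N) for every t >= N.
    # P: transition probabilities, shape (A, S, S); r: rewards, shape (A, S);
    # d: callable giving the discount weight d(t).
    A, S, _ = P.shape

    # Tail (t >= N): an ordinary beta-discounted MDP, solved by value iteration.
    v = np.zeros(S)
    while True:
        q = r + beta * (P @ v)          # Q-values, shape (A, S)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    tail_policy = q.argmax(axis=0)      # stationary decision rule used from time N on

    # Head (t < N): backward induction with the actual weights d(t).
    V = d(N) * v                        # value-to-go from time N, in the original units
    policies = [None] * N
    for t in range(N - 1, -1, -1):
        q_t = d(t) * r + (P @ V)        # shape (A, S)
        policies[t] = q_t.argmax(axis=0)
        V = q_t.max(axis=0)
    return policies, tail_policy

A general exponentially representable discount function would replace the single tail factor β by a sum of exponential terms, but the head/tail split shown here is the essential structure of an N-stationary policy.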
Full text
Eventually-stationary policies for Markov decision models with non-constant discounting
We investigate the existence of simple policies in finite discounted cost Markov Decision Processes when the discount factor is not constant. We introduce a class called "exponentially representable" discount functions. Within this class we prove existence of optimal policies which are eventually stationary, i.e., stationary from some time N onward, and provide an algorithm for their computation. Outside this c...
Full text
Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
This paper presents sufficient conditions for the existence of stationary optimal policies for average-cost Markov Decision Processes with Borel state and action sets and with weakly continuous transition probabilities. The one-step cost functions may be unbounded, and action sets may be noncompact. The main contributions of this paper are: (i) general sufficient conditions for the existence of ...
Full text
On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs...
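For readers unfamiliar with the inventory terminology, an (s, S) policy prescribes the order quantity a(x) as a function of the current inventory level x; the notation below is the standard one from inventory theory, not taken verbatim from the paper:

\[
a(x) \;=\;
\begin{cases}
S - x, & x \le s,\\
0, & x > s,
\end{cases}
\]

i.e., whenever the inventory falls to or below the reorder level s, order enough to bring it up to S; otherwise do not order.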
Full text
Sensitive Discount Optimality via Nested Linear Programs for Ergodic Markov Decision Processes
In this paper we discuss sensitive discount optimality for Markov decision processes. The n-discount optimality is a refined selective criterion that generalizes average optimality and bias optimality. Our approach is based on a system of nested linear programs. In the last section we provide an algorithm for the computation of the Blackwell optimal policy. The n-disco...
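The n-discount optimality criterion referred to here has a standard definition, recalled for context in our own notation (α denotes the discount factor and v_α^π the α-discounted value of policy π): a policy π is n-discount optimal if, for every policy σ and every state s,

\[
\liminf_{\alpha \uparrow 1}\, (1-\alpha)^{-n}\left[ v_\alpha^{\pi}(s) - v_\alpha^{\sigma}(s) \right] \;\ge\; 0 .
\]

On this scale, n = -1 corresponds to average optimality, n = 0 to bias optimality, and a policy that is n-discount optimal for all n is Blackwell optimal.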
Full text